Team Project - Unit 6
Multi-Agent Email Forensics System

Team: Group F

Team Members: Andrea Trevisi, Fabian Narel, Pavlos Papachristos

Project Title: AI Fraudulent Mail Detection System

Module: Intelligent Agents

Submission: October 2025

Introduction

In today's digital environment, email is the most exploited communication channel for cyber threats such as phishing, data theft, and corporate espionage (Verizon, 2024). The scale and speed of email communication in modern organizations make manual forensic review both impractical and ineffective.

This report proposes the development and implementation of an automated, modular system for email forensic analysis. The system is designed using a Multi-Agent System (MAS) architecture, in which distinct, autonomous agents collaborate to perform a complete forensic workflow. The workflow runs in sequence: a) test data generation, b) evidence discovery, c) analysis, d) visualization, and e) reporting.

This document outlines the system's technical requirements, key design decisions, and the underlying rationale supported by academic principles. It presents graphical models of the system's architecture, discusses anticipated challenges, and proposes mitigation strategies. The goal is to deliver a business-ready proposal for a robust, scalable, and interpretable email forensics tool that can enhance an organization's security posture.

1. System Requirements

The system is developed in Python 3 and relies on the Python standard library plus a small set of widely used third-party packages for data handling, visualization, and reporting. No specialized hardware is necessary.

Core Libraries

  • os, glob, datetime, random, collections - For fundamental operations such as file system interaction, date/time handling, and data structuring

Data Handling

  • pandas - Essential for structuring the email data into DataFrames, which facilitates efficient statistical analysis and manipulation required by the DashboardAgent

Visualization

  • matplotlib - Primary plotting library for creating static charts and graphs
  • seaborn - Used for advanced visualizations such as the activity heatmap
  • wordcloud - Generates a word cloud from suspicious email subjects, highlighting the most prominent keywords

Reporting

  • jinja2 - Templating engine to dynamically generate the comprehensive HTML report, embedding analysis results and visualizations into a structured, professional format

2. System Design and Rationale

Multi-Agent System Architecture

The system's design is grounded in a modular, agent-based methodology. This approach was chosen to improve maintainability and scalability, and to keep the responsibilities of each component clear.

A 'Multi-Agent System' (MAS) is a framework in which autonomous computational entities, or agents, interact to solve problems that are beyond their individual capabilities (Wooldridge, 2009). The MAS framework is well suited to complex, multi-stage tasks such as digital forensics (Al-Amri & Watson, 2021).

A main controller orchestrates the forensic workflow, activating each agent in sequence and handing its output to the next.
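A minimal sketch of this kind of sequential orchestration is given here. It is not the project's controller: the `Agent` type alias, the `run_pipeline` function, the shared context dictionary, and the three stand-in agents are all hypothetical names introduced for illustration.

```python
from typing import Callable

# Hypothetical convention: an agent is any callable that takes the shared
# context dictionary and returns it with its own results added.
Agent = Callable[[dict], dict]


def run_pipeline(agents: list[tuple[str, Agent]]) -> dict:
    """Activate each agent in turn, passing one shared context along."""
    context: dict = {"completed": []}
    for name, agent in agents:
        context = agent(context)
        context["completed"].append(name)
    return context


# Stand-in agents, each reduced to one line of work for illustration.
def generate(context: dict) -> dict:
    context["emails"] = ["URGENT: Account verification required",
                         "Weekly team meeting agenda"]
    return context


def analyse(context: dict) -> dict:
    context["findings"] = [s for s in context["emails"] if "urgent" in s.lower()]
    return context


def report(context: dict) -> dict:
    context["report"] = f"{len(context['findings'])} finding(s)"
    return context


result = run_pipeline([("generate", generate),
                       ("analyse", analyse),
                       ("report", report)])
print(result["completed"], result["report"])
```

Because every agent only reads from and writes to the context, an agent can be replaced or reordered without touching the others, which is the maintainability argument made above.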

Data Models

Two primary dataclasses structure the system's data:

SimpleEmail: Represents an email with fields: id, subject, sender, recipient, date, content, file_path. Includes methods:

  • is_suspicious() - Checks for suspicious keywords in subject/content
  • is_after_hours() - Determines if sent outside business hours (8 AM-6 PM)
  • is_external() - Checks if sender is from external domain

Finding: Represents an investigation finding with fields: finding_type, description, email_id, severity, timestamp
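The two data models could be sketched as follows. The field and method names come from the description above; the field types, the keyword and domain lists (taken from the AnalysisAgent description later in this report), and the treatment of exactly 6 PM as after hours are assumptions rather than the project's confirmed implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Assumed constants: the report lists these values under the AnalysisAgent,
# but does not state where the dataclass methods read them from.
SUSPICIOUS_KEYWORDS = ("confidential", "urgent", "verify", "bitcoin",
                       "phishing", "winner", "inheritance")
INTERNAL_DOMAINS = ("group-f.com", "internal.org")


@dataclass
class SimpleEmail:
    id: str
    subject: str
    sender: str
    recipient: str
    date: datetime
    content: str
    file_path: str = ""

    def is_suspicious(self) -> bool:
        """True if any suspicious keyword appears in the subject or content."""
        text = f"{self.subject} {self.content}".lower()
        return any(word in text for word in SUSPICIOUS_KEYWORDS)

    def is_after_hours(self) -> bool:
        """True if sent outside 8 AM-6 PM (hours 8 to 17 count as business)."""
        return not (8 <= self.date.hour < 18)

    def is_external(self) -> bool:
        """True if the sender's domain is not one of the internal domains."""
        domain = self.sender.rsplit("@", 1)[-1].lower()
        return domain not in INTERNAL_DOMAINS


@dataclass
class Finding:
    finding_type: str
    description: str
    email_id: str
    severity: str
    timestamp: datetime = field(default_factory=datetime.now)
```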

Multi-Agent System Workflow

The forensic workflow is executed sequentially, orchestrated by a main controller that activates each agent in turn:

Step 1: EnhancedEmailGenerator (with Suspicious Subject Generator)

The process begins with the EnhancedEmailGenerator, which creates a realistic set of test data. This agent is vital for validation, producing both benign emails and suspicious ones with sophisticated subjects crafted by a dedicated SuspiciousSubjectGenerator.

Implementation: Generates 50 emails (30% suspicious rate)

Features:

  • Realistic normal subjects: "Weekly team meeting agenda", "Project status update", "Q3 budget review"
  • Realistic suspicious subjects: "URGENT: Account verification required", "CRITICAL: Security breach detected"
  • Combinatorial subject generation for variety
  • After-hours timestamps for suspicious emails (2-7 AM, 10 PM-11 PM)
  • Business hours timestamps for normal emails (8 AM-6 PM)
  • Team member emails: pavlos.papachristos@Group-F.com, fabian.narel@Group-F.com, andrea.trevisi@Group-F.com
  • External suspicious domains: phishing-site.com, suspicious-bank.net, lottery-scam.org
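The generation rules listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the EnhancedEmailGenerator itself: the `generate_emails` function, the dictionary layout, the `alerts@` sender name, the October 2025 dates, and the reading of "30% suspicious rate" as an exact share (15 of 50) are all hypothetical.

```python
import random
from datetime import datetime

NORMAL_SUBJECTS = ["Weekly team meeting agenda", "Project status update",
                   "Q3 budget review"]
# Combinatorial generation: any prefix can be paired with any topic.
SUSPICIOUS_PREFIXES = ["URGENT", "CRITICAL"]
SUSPICIOUS_TOPICS = ["Account verification required", "Security breach detected"]
TEAM = ["pavlos.papachristos@Group-F.com", "fabian.narel@Group-F.com",
        "andrea.trevisi@Group-F.com"]
EXTERNAL_DOMAINS = ["phishing-site.com", "suspicious-bank.net", "lottery-scam.org"]
# Assumed reading of "2-7 AM, 10 PM-11 PM": hours 2..6 and hour 22.
AFTER_HOURS = [2, 3, 4, 5, 6, 22]


def generate_emails(count=50, suspicious_rate=0.3, seed=None):
    rng = random.Random(seed)
    n_suspicious = round(count * suspicious_rate)
    flags = [True] * n_suspicious + [False] * (count - n_suspicious)
    rng.shuffle(flags)

    emails = []
    for number, suspicious in enumerate(flags, start=1):
        if suspicious:
            subject = f"{rng.choice(SUSPICIOUS_PREFIXES)}: {rng.choice(SUSPICIOUS_TOPICS)}"
            sender = f"alerts@{rng.choice(EXTERNAL_DOMAINS)}"  # hypothetical local part
            hour = rng.choice(AFTER_HOURS)
        else:
            subject = rng.choice(NORMAL_SUBJECTS)
            sender = rng.choice(TEAM)
            hour = rng.randint(8, 17)  # business hours, 8 AM up to 5:59 PM
        recipient = rng.choice([addr for addr in TEAM if addr != sender])
        emails.append({
            "id": f"email_{number:03d}",
            "subject": subject,
            "sender": sender,
            "recipient": recipient,
            "date": datetime(2025, 10, rng.randint(1, 28), hour, rng.randint(0, 59)),
            "suspicious": suspicious,
        })
    return emails
```

Passing a fixed `seed` makes a generated data set reproducible, which helps when comparing detection results between runs.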

Step 2: DiscoveryAgent

The generated data is located and processed by the DiscoveryAgent. This agent simulates the initial stage of a forensic investigation by identifying and collecting evidence from the file system, loading it into structured SimpleEmail objects for processing.

Implementation:

  • Searches output/emails/ directory for .txt files
  • Parses email files extracting ID, Subject, From, To, Date, Content
  • Loads emails into SimpleEmail dataclass objects
  • Decouples data acquisition from analytical processes (Fowler, 2018)
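The discovery step might look like the sketch below. The report does not specify the on-disk layout of the email files, so this assumes a hypothetical format: `Header: value` lines, a blank line, then the body, with the date stored in ISO 8601 form. The function names and the dictionary output are also assumptions; the project loads the parsed values into SimpleEmail objects.

```python
import glob
import os
from datetime import datetime


def parse_email_file(path):
    """Parse one email file in the assumed 'Header: value' + blank line + body layout."""
    with open(path, encoding="utf-8") as handle:
        header_text, _, body = handle.read().partition("\n\n")

    headers = {}
    for line in header_text.splitlines():
        # Split on the first colon only, so subjects like "URGENT: ..." survive.
        key, separator, value = line.partition(":")
        if separator:
            headers[key.strip().lower()] = value.strip()

    return {
        "id": headers.get("id", ""),
        "subject": headers.get("subject", ""),
        "sender": headers.get("from", ""),
        "recipient": headers.get("to", ""),
        "date": datetime.fromisoformat(headers["date"]),
        "content": body.strip(),
        "file_path": path,
    }


def discover_emails(directory="output/emails"):
    """Collect and parse every .txt file in the evidence directory."""
    pattern = os.path.join(directory, "*.txt")
    return [parse_email_file(path) for path in sorted(glob.glob(pattern))]
```

Keeping the parsing here, and returning plain records, is what lets the later agents stay unaware of how or where the evidence was stored.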

Step 3: AnalysisAgent

The AnalysisAgent performs the main analytical tasks. It employs a multi-faceted detection strategy that moves beyond simple keyword matching to analyse emails based on keywords, timing, communication patterns, and combinations of these factors. This layered approach provides context-aware analysis.

Four Analysis Methods:

  1. Keyword Analysis: Detects suspicious words like 'confidential', 'urgent', 'verify', 'bitcoin', 'phishing', 'winner', 'inheritance'
  2. Timing Analysis: Flags emails sent outside business hours (before 8 AM or after 6 PM)
  3. External Communication Analysis: Identifies emails from external domains (not Group-F.com or internal.org)
  4. Volume Analysis: Detects unusual patterns in email frequency

Output: Finding objects with type, description, email_id, severity, and timestamp
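The four analysis methods can be illustrated with the sketch below. The keyword list, business hours, and internal domains are taken from the description above. The severity assigned to each finding type, the `volume_threshold` parameter, the per-sender definition of "unusual volume", and the dictionary-based email records are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

KEYWORDS = ("confidential", "urgent", "verify", "bitcoin",
            "phishing", "winner", "inheritance")
INTERNAL_DOMAINS = ("group-f.com", "internal.org")


@dataclass
class Finding:
    finding_type: str
    description: str
    email_id: str
    severity: str
    timestamp: datetime


def analyse(emails, volume_threshold=5):
    """Run keyword, timing, external, and volume checks over email records."""
    findings = []
    for email in emails:
        text = f"{email['subject']} {email['content']}".lower()
        hits = [word for word in KEYWORDS if word in text]
        if hits:
            findings.append(Finding("keyword", f"Keywords: {', '.join(hits)}",
                                    email["id"], "high", email["date"]))
        if not 8 <= email["date"].hour < 18:
            findings.append(Finding("timing", f"Sent at {email['date']:%H:%M}",
                                    email["id"], "medium", email["date"]))
        domain = email["sender"].rsplit("@", 1)[-1].lower()
        if domain not in INTERNAL_DOMAINS:
            findings.append(Finding("external", f"External domain: {domain}",
                                    email["id"], "medium", email["date"]))

    # Volume check (assumed definition): any sender above the threshold.
    per_sender = Counter(email["sender"] for email in emails)
    for sender, total in per_sender.items():
        if total > volume_threshold:
            latest = max(e["date"] for e in emails if e["sender"] == sender)
            findings.append(Finding("volume", f"{sender} sent {total} emails",
                                    "", "low", latest))
    return findings
```

Each check is a plain rule, so every finding can be traced back to the condition that produced it.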

Step 4: DashboardAgent

The DashboardAgent transforms raw data into a suite of eight distinct visualizations for rapid interpretation by security analysts.

Eight Visualizations Generated:

  1. Summary Statistics Chart (bar chart)
  2. Pie Chart (suspicious vs. normal email distribution)
  3. Histogram (finding types distribution)
  4. Word Cloud (from suspicious email subjects)
  5. Timeline Analysis (findings over time)
  6. Activity Heatmap (email patterns by hour/day)
  7. Network Analysis (sender-recipient relationships)
  8. Severity Distribution (high/medium/low risk findings)

Tools Used: matplotlib, seaborn, wordcloud libraries

Output: PNG images saved to output/visualizations/
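As an illustration of the data preparation behind one of these charts, the sketch below builds the day-by-hour grid that an activity heatmap needs. It uses only the standard library; the `activity_grid` function and the `DAYS` labels are hypothetical names, and the project itself may aggregate with pandas instead.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_grid(timestamps):
    """Return a 7x24 grid of email counts: rows are weekdays, columns are hours."""
    counts = Counter((stamp.weekday(), stamp.hour) for stamp in timestamps)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


grid = activity_grid([
    datetime(2025, 10, 1, 3, 5),    # Wednesday, 3 AM
    datetime(2025, 10, 1, 3, 40),   # Wednesday, 3 AM
    datetime(2025, 10, 4, 22, 0),   # Saturday, 10 PM
])
print(grid[2][3], grid[5][22])
```

A grid in this shape could then be handed to a plotting call such as `seaborn.heatmap(grid, yticklabels=DAYS)` and saved as a PNG; that step is omitted here so the example runs without third-party packages.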

Step 5: ReportAgent

The ReportAgent consolidates all statistics, detailed findings, and visualizations into comprehensive text and HTML formats, creating a permanent, shareable record for archival, evidentiary, and executive communication purposes.

Report Components:

  • Executive summary with key statistics
  • Detailed findings list
  • Embedded visualizations
  • Recommendations for security team

Template Engine: Jinja2 for dynamic HTML generation

Output: forensics_report.html and text report in output/reports/
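The text version of the report could be assembled along the lines of the sketch below. The `build_text_report` function, the report layout, and the dictionary-based findings are assumptions made for illustration; the HTML version, which the project renders through a Jinja2 template, is not shown.

```python
from collections import Counter


def build_text_report(total_emails, findings):
    """Build a plain-text report: executive summary, then detailed findings."""
    severity_counts = Counter(item["severity"] for item in findings)
    flagged_ids = {item["email_id"] for item in findings}

    lines = [
        "EMAIL FORENSICS REPORT",
        "",
        "Executive summary",
        f"  Emails analysed: {total_emails}",
        f"  Emails with findings: {len(flagged_ids)}",
        f"  Total findings: {len(findings)}",
    ]
    for level in ("high", "medium", "low"):
        lines.append(f"  {level.capitalize()} severity: {severity_counts[level]}")

    lines += ["", "Detailed findings"]
    for item in findings:
        lines.append(f"  [{item['severity']}] {item['email_id']}: {item['description']}")
    return "\n".join(lines) + "\n"
```

The same summary values (totals and severity counts) are what a template would receive as variables when producing the HTML report.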

Step 6: UML Documentation Generation

Automatically generates UML diagrams documenting the system architecture:

  • Class Diagram: Shows static structure of agents and data models
  • Sequence Diagram: Shows dynamic interaction flow between agents

Output: PlantUML (.puml) files in output/uml_documentation/
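Generating a PlantUML class diagram amounts to writing a small text file. The sketch below shows one way to do it; the `class_diagram` and `write_puml` functions, the `class_diagram.puml` filename, and the members and relation labels in the example are hypothetical, not the project's actual generator output.

```python
import os


def class_diagram(classes, relations):
    """Render classes and directed relations as PlantUML source text."""
    lines = ["@startuml"]
    for name, members in classes.items():
        lines.append(f"class {name} {{")
        lines.extend(f"  {member}" for member in members)
        lines.append("}")
    for source, target, label in relations:
        lines.append(f"{source} --> {target} : {label}")
    lines.append("@enduml")
    return "\n".join(lines) + "\n"


def write_puml(text, directory="output/uml_documentation",
               filename="class_diagram.puml"):
    """Save the diagram source, creating the output directory if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


diagram = class_diagram(
    {"SimpleEmail": ["+id", "+subject", "+is_suspicious()"],
     "Finding": ["+finding_type", "+severity"],
     "AnalysisAgent": ["+analyse()"]},
    [("AnalysisAgent", "SimpleEmail", "examines"),
     ("AnalysisAgent", "Finding", "produces")],
)
print(diagram)
```

The resulting .puml file is plain text, so it can be kept under version control and rendered to an image with the PlantUML tool when needed.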

3. Graphical System Designs

UML Diagrams

The system's architecture and behaviour are visualized using standard UML diagrams to provide clear, industry-standard documentation.

Figure 1: UML Class Diagram

Illustrates the static structure of the multi-agent system. It defines each agent as a class with its specific attributes and methods and shows how the classes relate to one another: the EnhancedEmailGenerator produces SimpleEmail objects, which are then examined by the AnalysisAgent, resulting in Finding records. This flow reflects the system's modular design and the interfaces between its components.

Figure 2: UML Sequence Diagram

Presents a dynamic view of the system, showing the sequence of interactions between the agents as orchestrated by the Main Controller. The flow begins with data generation, proceeds through discovery and analysis, and concludes with dashboard and report generation. The diagram visualizes the system's operational workflow from start to finish and confirms the logical progression of the forensic process.

Note: Full UML diagrams are included in the complete project report PDF.

Personal Reflection

Working with Andrea Trevisi and Pavlos Papachristos on this email forensics system taught me the practical value of multi-agent architectures. The modular design, where each agent had clearly defined responsibilities, enabled parallel development and made the system easier to maintain (Wooldridge, 2009).

Implementing the sequential orchestration revealed the importance of well-defined interfaces between components. Integration challenges between agents taught us that explicit interface specifications prevent costly rework (Fowler, 2018). The rule-based approach provided transparency essential for forensic analysis (Casey, 2011), though this came at the cost of adaptability compared to learning-based systems (Sommer & Paxson, 2010).

Team coordination across different time zones required disciplined communication, and version control conflicts taught us to establish clearer workflows early. This project effectively bridged theory and practice, making abstract concepts about agent coordination tangible through real implementation challenges.
